11th May 2021

Outline

Introduction

  • Presentation of data
  • Data wrangling

Materials and Methods

  • Data visualization
  • Logistic regression
  • Principal Component Analysis
  • K-means clustering

Results and discussion

Introduction

  • Byar & Greene prostate cancer data, from Andrews DF and Herzberg AM (1985)
  • Compares four treatments: placebo and 0.2, 1.0, and 5.0 mg estrogen
  • 502 observations of 18 variables
  • 27 NA values
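
The observation, variable, and NA counts above can be checked with a few base R calls. This is a minimal sketch on a hypothetical toy data frame; the column names `age`, `wt`, and `sz` are assumptions, not taken from the slides.

```r
# Hypothetical toy data standing in for the real data set
toy <- data.frame(
  age = c(75, NA, 69, 71),
  wt  = c(76, 116, NA, 98),
  sz  = c(2, 42, 3, NA)
)

dim(toy)             # number of observations and number of variables
sum(is.na(toy))      # total number of missing values
colSums(is.na(toy))  # missing values per variable
```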

Variables in the data set

Presentation of the variables with an R output, table, or image?

Data wrangling

Raw data -> Clean data

  • Exclude dtime, sdate and sg
  • Renaming

Clean data -> Augment data

  • Add five new variables: outcome, treatment_mg, EKG_lvl, performance_lvl, age_group
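
The two wrangling steps might look roughly like the following dplyr sketch. Only `dtime`, `sdate`, and `sg` are names from the slides; the other raw columns (`patno`, `rx`, `age`), the renamed columns, and the age cut points are assumptions chosen for illustration, and only two of the five new variables are shown.

```r
library(dplyr)

# Hypothetical raw rows. Only dtime, sdate and sg are column names taken from
# the slides; patno, rx and age are assumed for illustration.
raw <- tibble(
  patno = 1:4,
  rx    = c("placebo", "0.2 mg estrogen", "1.0 mg estrogen", "5.0 mg estrogen"),
  age   = c(62, 71, 78, 84),
  dtime = c(72, 1, 40, 20),
  sdate = c(2778, 2820, 2900, 3000),
  sg    = c(5, 9, 8, NA)
)

# Raw data -> Clean data: drop the excluded variables, then rename
clean <- raw %>%
  select(-c(dtime, sdate, sg)) %>%
  rename(patient_id = patno, treatment = rx)  # assumed new names

# Clean data -> Augment data: derive new variables from existing ones
augmented <- clean %>%
  mutate(
    # dose as a factor with levels 0, 0.2, 1, 5
    treatment_mg = factor(case_when(
      treatment == "placebo"         ~ 0,
      treatment == "0.2 mg estrogen" ~ 0.2,
      treatment == "1.0 mg estrogen" ~ 1,
      treatment == "5.0 mg estrogen" ~ 5
    )),
    # assumed cut points; the slides do not state the actual age groups
    age_group = cut(age, breaks = c(-Inf, 65, 75, Inf),
                    labels = c("<=65", "66-75", ">75"))
  )

augmented
```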

Materials and Methods

Data visualization

Pre-treatment variables - Numeric

Pre-treatment variables - Categorical

Pre-treatment variables - Heatmap

Treatment, outcome and age

Logistic regression

Model outcome as function of treatment
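
A model of this form could be fit with `glm()` and summarised with `broom::tidy()`, which returns the columns `term`, `estimate`, `std.error`, `statistic`, and `p.value`. The sketch below uses simulated data, so its numbers are not the study's; the coding of `outcome` as 0/1 and the use of dose 0 as the reference level are assumptions.

```r
library(broom)

set.seed(1)
# Simulated stand-in data, not the prostate cancer data
toy <- data.frame(
  treatment_mg = factor(rep(c(0, 0.2, 1, 5), each = 50)),
  outcome      = rbinom(200, size = 1, prob = 0.6)
)

# Binomial GLM with the default logit link; the first factor level (0)
# serves as the reference category
fit <- glm(outcome ~ treatment_mg, data = toy, family = binomial)

# One row per coefficient
log_mod_treatment <- tidy(fit)
log_mod_treatment
```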

Output:

log_mod_treatment
## # A tibble: 4 x 5
##   term            estimate std.error statistic     p.value
##   <chr>              <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)       1.11       0.211     5.27  0.000000136
## 2 treatment_mg0.2   0.0907     0.301     0.301 0.763      
## 3 treatment_mg1    -0.807      0.280    -2.88  0.00394    
## 4 treatment_mg5    -0.0536     0.294    -0.182 0.855
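
The estimates above are on the log-odds scale. Exponentiating them gives odds ratios relative to the reference level, and a Wald interval follows from the standard errors. The sketch below reuses the reported estimates and standard errors; which event `outcome` codes as 1 is not stated on the slides, so the direction of the effect depends on that coding.

```r
# Estimates and standard errors copied from the reported output
coefs <- data.frame(
  term      = c("treatment_mg0.2", "treatment_mg1", "treatment_mg5"),
  estimate  = c(0.0907, -0.807, -0.0536),
  std.error = c(0.301, 0.280, 0.294)
)

z <- qnorm(0.975)  # about 1.96, for a 95% Wald interval

coefs$odds_ratio <- exp(coefs$estimate)
coefs$ci_low     <- exp(coefs$estimate - z * coefs$std.error)
coefs$ci_high    <- exp(coefs$estimate + z * coefs$std.error)

coefs
```

For the 1.0 mg dose the odds ratio is about 0.45, and its interval lies entirely below 1, consistent with the small p-value in the table. The intervals for the other two doses include 1.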

Logistic regression for each variable

Treatment 1.0 mg
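
One way to run a separate logistic regression for each variable is to loop over the predictor names and keep the row for that predictor from each fit. This is a sketch on simulated data standing in for the 1.0 mg subgroup; the column names `age`, `tumor_size`, and `weight_index` are assumptions, and the slides do not confirm that the analysis was done this way.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(2)
# Simulated stand-in for the patients who received 1.0 mg
toy <- tibble(
  outcome      = rbinom(120, size = 1, prob = 0.5),
  age          = rnorm(120, mean = 71, sd = 7),
  tumor_size   = rexp(120, rate = 1 / 15),
  weight_index = rnorm(120, mean = 99, sd = 13)
)

predictors <- c("age", "tumor_size", "weight_index")

# Fit outcome ~ <predictor> once per predictor and keep that predictor's row
per_variable <- map(predictors, function(v) {
  fit <- glm(reformulate(v, response = "outcome"),
             data = toy, family = binomial)
  tidy(fit) %>% filter(term == v)
}) %>%
  bind_rows()

per_variable
```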

Effects of significant variables for treatment 1.0 mg

Distribution of significant variables for each outcome

Principal Component Analysis
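
A PCA on the numeric variables could be run with `prcomp()`. The sketch below uses simulated data with assumed column names and scales each variable to unit variance, which is a common choice when units differ but is not confirmed by the slides.

```r
set.seed(3)
# Simulated numeric variables; names are assumptions for illustration
toy <- data.frame(
  age              = rnorm(100, mean = 71, sd = 7),
  weight_index     = rnorm(100, mean = 99, sd = 13),
  tumor_size       = rexp(100, rate = 1 / 15),
  acid_phosphatase = rlnorm(100, meanlog = 0, sdlog = 1)
)

# Centre and scale so that no variable dominates because of its units
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
var_explained

# Scores on the first two components, e.g. for a scatter plot
head(pca$x[, 1:2])
```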

K-means clustering
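
K-means clustering on scaled numeric variables could be sketched as follows. The data are simulated, the column names are assumptions, and `centers = 2` is an illustrative choice; the slides do not state how many clusters were used.

```r
set.seed(4)
# Simulated numeric variables; names are assumptions for illustration
toy <- data.frame(
  tumor_size       = rexp(100, rate = 1 / 15),
  acid_phosphatase = rlnorm(100, meanlog = 0, sdlog = 1)
)

# Scale first so both variables contribute comparably to the distances;
# nstart = 25 reruns the algorithm from several random starts
km <- kmeans(scale(toy), centers = 2, nstart = 25)

table(km$cluster)  # cluster sizes
km$centers         # cluster centres on the scaled variables
```

The cluster labels in `km$cluster` can then be cross-tabulated against a known grouping, such as disease stage, with `table()`.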

Results and discussion

  • Stage 3 and 4 patients differ in tumor size and acid phosphatase levels
  • Most effective treatment is 1.0 mg estrogen
  • Significant variables are tumor size, CVD, age, and weight index